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Abstract — This paper addresses the problem of sparsity penal- 
ized least squares for applications in sparse signal processing, 
e.g. sparse deconvolution. This paper aims to induce sparsity 
more strongly than LI norm regularization, while avoiding non- 
convex optimization. For this purpose, this paper describes the 
design and use of non-convex penalty functions (regularizers) 
constrained so as to ensure the convexity of the total cost function, 
F, to be minimized. The method is based on parametric penalty 
functions, the parameters of which are constrained to ensure 
convexity of F. It is shown that optimal parameters can be 
obtained by semideflnite programming (SDP). This maximally 
sparse convex (MSC) approach yields maximally non-convex 
sparsity-inducing penalty functions constrained such that the 
total cost function, F, is convex. It is demonstrated that iterative 
MSC (IMSC) can yield solutions substantially more sparse than 
the standard convex sparsity-inducing approach, i.e., LI norm 
minimization. 

I. Introduction 

In sparse signal processing, the £\ norm has special sig- 
nificance [4], [5]. It is the convex proxy for sparsity. Given 
the relative ease with which convex problems can be reU- 
ably solved, the (.\ norm is a basic tool in sparse signal 
processing. However, penalty functions that promote sparsity 
more strongly than the £i norm yield more accurate results in 
many sparse signal estimation/reconstruction problems. Hence, 
numerous algorithms have been devised to solve non-convex 
formulations of the sparse signal estimation problem. In the 
non-convex case, generally only a local optimal solution can 
be ensured; hence the initialization and the specification of 
algorithm parameters become important, as does the particular 
choice of non-convex penalty function. 

This paper aims to develop an approach that promotes 
sparsity more strongly than the £i norm, but which attempts to 
avoid, as far as possible, complications arising in non-convex 
optimization. In particular, the paper addresses iU-posed linear 
inverse problems of the form 

JV-l 

arg min |f(x) = ||y - HxHl -|- A„0„(a;„)| (1) 

n— 

where A„ > and ^„ : M — >^ M are sparsity-inducing 
regularizers (penalty functions) for n e Zjv = {0, . . . , A''— 1}. 
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Problems of this form arise in denoising, deconvolution, 
compressed sensing, etc. 

This paper explores the use of non-convex penalty functions 
(j)n, under the constraint that the total cost function F is 
convex and therefore reliably minimized. To this end, we 
employ penalty functions parameterized by variables a„, i.e., 
4>n{x) = (j){x; an), wherein the parameters a„ are selected so 
as to ensure convexity of the total cost function F. A key 
idea of the proposed approach is that the parameters a„ can 
be optimized to make the penalty functions 0„ maximally 
non-convex (i.e., maximally sparsity-inducing), subject to the 
constraint that F is convex. We refer to this as the 'maximally- 
sparse convex' (MSC) approach. The allowed interval for 
the parameters a„, to ensure F is convex, is obtained by 
formulating a semidefinite program (SDP) [2], which is itself 
a convex optimization problem. Hence, in the proposed MSC 
approach, the cost function F to be minimized depends itself 
on the solution to a convex problem. This paper also describes 
an iterative MSC (IMSC) approach that boosts the appUcability 
and effectiveness of the MSC approach. 

The proposed MSC approach requires a suitable parametric 
penalty function ; a), where a controls the degree to which 
(f) is non-convex. Therefore, this paper also addresses the 
choice of parameterized non-convex penalty functions so as 
to enable the approach. The paper proposes suitable penalty 
functions (f) and describes their relevant properties. 

A. Related Work (Threshold Functions) 

When H in (1) is the identity operator, the problem is one of 
denoising and is separable in a;„. In this case, a sparse solution 
X is usually obtained by some type of threshold function, 
: R M. The simplest and perhaps most widely used 
threshold functions are the soft and hard threshold functions 
[19]. Each has its disadvantages, and many other thresholding 
functions that provide a compromise of the soft and hard 
thresholding functions have been proposed - for example: the 
firm threshold [28], the non-negative (nn) garrote [24], [27], 
the SCAD threshold function [22], [61], and the proximity 
operator of the non-convex ^p quasi-norm, i.e., 4'{x) = \x\^, 
< p < 1 [39]. Several penalty functions are unified 
by the two-parameter formulas given in [3], [30], wherein 
threshold functions are derived as proximity operators [17]. 
(Table 1.2 of [17] lists the proximity operators of numerous 
functions.) Further threshold functions are defined directiy by 
their functional form [58]-[60]. 

The threshold function used in this paper is developed as a 
proximity operator (i.e., via a penalty function, cf), or rather. 
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via the derivative of a penalty function, </)'), and is designed so 
as to have the three properties advocated in [22]: unbiasedness 
(of large coefficients), sparsity, and continuity. Further, the 
threshold function 9 and its corresponding penalty function 
^ are parameterized by two parameters: the threshold T and 
the right-sided derivative of 6 at the threshold, i.e. 9'{T+), 
a measure of the threshold function's sensitivity. Like other 
threshold functions, the proposed threshold function biases 
large \xn\ less than does the soft threshold function, but is 
continuous unlike the hard threshold function. As will be 
shown below, the proposed function is most similar to the 
threshold function (proximity operator) corresponding to the 
logarithmic penalty, but it is designed to have less bias. It is 
also particularly convenient in algorithms for solving (1) that 
do not call on the threshold function directly, but instead call 
on the derivative of penalty function, (p'ix), due to its simple 
functional form. Such algorithms include iterative reweighted 
least squares (IRLS) [33], iterative reweighted £i [10], [57], 
FOCUSS [48], and algorithms derived using majorization- 
minimization (MM) [23] wherein the penalty function is upper 
bounded (e.g. by a quadratic or linear function). 

Sparsity-based nonlinear estimation algorithms can also be 
developed by formulating suitable non-Gaussian probability 
models that reflect sparse behavior, and by applying Bayesian 
estimation techniques [1], [15], [21], [34], [35], [43], [45]. 
A novel approach in this vein is based on codelength min- 
imization [47]. We note that, the approach we take below 
is essentially a deterministic one; we do not explore its 
formulation from a Bayesian perspective. 

B. Related Work (Sparsity Penalized Least Squares) 

Numerous problem formulations and algorithms to obtain 
sparse solutions to the general ill-posed linear inverse problem, 
(1), have been proposed. The ii norm penalty (i.e., 4)n{x) = 
\x\) has been proposed for sparse deconvolution [9], [14], [36], 
[54] and more generally for sparse signal processing [13] and 
statistics [55]. For the £i norm and other non-differentiable 
convex penalties, efficient algorithms for large scale problems 
of the form (1) and similar (including convex constraints) 
have been developed based on proximal splitting methods [16], 
[17], alternating direction method of multipliers (ADMM) [8], 
majorization-minimization (MM) [23], primal-dual gradient 
descent [20], and Bregman iterations [31]. 

Several approaches aim to obtain solutions to (1) that are 
more sparse than the £\ norm solution. Some of these methods 
proceed first by selecting a non-convex penalty function that 
induces sparsity more strongly than the ii norm, and second 
by developing non-convex optimization algorithms for the 
minimization of F; for example, iterative reweighted least 
squares (IRLS) [33], [57], FOCUSS [32], [48], and [11], [42], 
[53]. With the availability of fast reliable algorithms for £i 
norm minimization, reweighted f.i norm minimization is a 
suitable approach for the non-convex problem [10], [57]: the 
tighter upper bound of the non-convex penalty provided by 
the weighted £i norm, as compared to the weighted £2 norm, 
reduces the chance of convergence to poor local minima. Other 
algorithmic approaches include 'difference of convex' (DC) 
programming [29] and operator splitting [12]. 



A distinct approach to obtain sparse solutions to (1) is to 
find an approximate solution minimizing the £0 quasi-norm 
or satisfying an £q constraint. Examples of such algorithms 
include: matching pursuit (MP) and orthogonal MP (OMP) 
[40], greedy £1 [38], iterative hard thresholding (IHT) [6], [7], 
[37], [44], hard thresholding pursuit [26], smoothed £0, [41], 
iterative support detection (ISD) [56], single best replacement 
(SBR) [51], and ECME thresholding [46]. 

In contrast to these works, in this paper the penalties 
are constrained by the operator H. This approach (MSC) 
deviates from the usual approach which chooses the penalty 
not based on the system (or 'channel') H through which 
X is observed, but on prior knowledge of x. We also note 
that, by design, the proposed approach leads to a convex 
optimization problem; hence, it differs from approaches that 
pursue non-convex optimization. It also differs from usual 
convex approaches for sparse signal estimation/recovery which 
utilize convex penalties. In this paper, the aim is precisely to 
utilize non-convex penalties that induce sparsity more strongly 
than a convex penalty possibly can. 

Compared to algorithms aiming to solve the £0 quasi-norm 
problem, the proposed approach again differs. First, the £q 
problem is highly non-convex, while the proposed approach 
defines a convex problem. Second, methods for £q seek the 
correct support (index set of non-zero elements) of x and 
do not regularize (penalize) any element Xn in the calculated 
support.' In contrast, the design of the regularizer (penalty) 
is at the center of the proposed approach, and no Xn is left 
unregularized. 

Once a reasonably sparse solution is obtained, by convex £1 
norm minimization or otherwise, it can be helpful to subse- 
quently perform least squares over only the non-zero elements 
of the obtained solution. This is referred to as 'debiasing' [25], 
and can be used as an optional post-processing step for the 
proposed MSC approach as well. 

II. Scalar Threshold Functions 

The proposed threshold function and corresponding penalty 
function is intended to serve as a compromise between soft 
and hard threshold functions, and as a parameterized family 
of functions for use with the proposed MSC method for ill- 
posed Unear inverse problems, to be described in Sect. III. 

First, we note the high sensitivity of the hard threshold 
function to small changes in its input. If the input is slightly 
less than the threshold T, then a small positive perturbation 
produces a large change in the output, i.e., (/)h(T — e) = 
and 0h(T + e) ss T where (9h : M ^ M denotes the hard 
threshold function. Due to this discontinuity, spurious noise 
peaks/bursts often appear as a result of hard-thresholding 
denoising. For this reason, a continuous threshold function 
is often preferred. The susceptibiUty of a threshold function 
6 to the phenomenon of spurious noise peaks can be roughly 
quantified by the maximum value its derivative attains, i.e., 
maxy^B.d'{y), provided 9 is continuous. For the threshold 
functions considered below, 6' attains its maximum values 

'Some £0 methods, e.g. reweighted £1 techniques, utiUze regularization 
within an algorithm, but the final result is not regularized over its support. 
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at y = ±r+; hence, the value of 9'{T'^) will be noted. 
The soft threshold function 0^ has 9[{T~^) = 1 reflecting 
its insensitivity. However, 6*5 substantially biases (attenuates) 
large values of its input; i.e., 6s{y) = y — T for y > T. 



A. Problem Statement 

In this section, we seek a threshold function and correspond- 
ing penalty (0 for which the 'sensitivity' 6''(T+) can be readily 
tuned from 1 to infinity and (»') that does not substantiaUy bias 
large y, i.e., y — 9{y) decays to zero rapidly as y increases. 

For a given penalty function (p, the proximity operator [17] 
denoted 6* : M M is defined by 



9(y) = argmin 



1 



F{x) = -{y - xY + X<l^{x) 



(2) 



where A > 0. Conmion sparsity-inducing penalties include 

(j){x) = \x\ and (f){x) = - log{l + a \x\) . (3) 

We similarly assume in the following that (j>{x) is differen- 
tiable for all a; e ffi except a; = 0, and that (f) is symmetric, 
i.e., (l){—x) = (f){x). 

If 0{y) = for aU \y\ < T for some T > 0, and T is 
the maximum such value, then the function ^ is a threshold 
function and T is the threshold. 

It is often beneficial in practice if 9 admits a simple func- 
tional form. However, as noted above, a number of algorithms 
for solving (1) do not use 9 directly, but use instead. In that 
case, it is beneficial if (j)' has a simple function form. This is 
relevant in Sec. HI where such algorithms will be used. 

In order that y — 9{y) approaches zero, the penalty function 
4> must be non-convex, as shown by the foUowing. 

Proposition 1. Suppose : M — > M is a convex function and 
9{y) denotes the proximity operator associated with 0, defined 
in (2). If < j/i < t/2, then 



2/1 - ^(2/1) < y2 - ^(2/2) 



(4) 



Proof: Let Ui = 9{yi) for i = 1,2. We have, 

yi€Ui + Xd4>{ui). (5) 

Since j/2 ^ yi, by the monotonicity of both of the terms on 
the right hand side of (5), it follows that U2 ^ Ui. 

If U2 = ui, (4) holds with since 2/2 > 2/i- 

Suppose now that U2 > ui. Note that the subdifferential 
d(j) is also a monotone mapping since ^ is a convex function. 
Therefore it foUows that if Zi G Xd<p{ui), we should have 
Z2 ^ zi. Since yt — 9{yi) e \d(l){ui), the claim foUows. ■ 

According to the proposition, if the penalty is convex, then 
the gap between 9{y) and y increases as the magnitude of y 
increases. The larger y is, the greater the bias (attenuation) 
is. The soft threshold function is an extreme case that keeps 
this gap constant (beyond the threshold T, the gap is equal to 
T). Hence, in order to avoid attenuation of large values, the 
penalty function must be non-convex. 



B. Properties 

Suppose F, defined in (2), is convex and (t){x) is differen- 
tiable for all a; e M except a; = 0. Then the subdifferential dF 
is given by 

^"^^"^^ \[A0'(O-),A0'(O+)]-2/, ifa; = 0. 

Since F is convex, its minimizer x* satisfies e dF{x*). 
From (6), if y G [A</.'(0-), A(^'(0+)], then € 9F(0), and in 
turn X* = 0. Assuming </> is symmetric, 0'(O~) = —0(0+), 
this interval represents the thresholding interval of 9, and the 
threshold T is given by 



T = A0'(O+). 



(7) 



Suppose now that y ^ [A(/)'(0~), A(/)'(0"'")]. This can happen 
if either of the following hold: (i) y > A0'(O+), (ii) y < 
X(j)'{0~). In the sequel, we will study the case (i). The results 
can be extended to (ii) straightforwardly. 

First, note that if y > A0'(O+), then a;* > and it satisfies 



y = x*+X^'{x*). 
Let us define / : K+ ^ R as 

f{x) =x + X(l)'{x). 



(8) 



(9) 



For y > A0'(O+), the threshold function 9 can now be 
expressed as 

0{y) = r\y)- (10) 

Observe that / is continuous and /(0+) = X(j)'{0+) = T. 
In view of (10), this implies that 6'(r+) = 0. Thus, 9{y) is 
continuous at the threshold. 

For a symmetric (p, it can be shown that F is convex if and 
only if / is monotone, i.e., f'{x) > 0,Va; > 0. This in tum is 
equivalent to 

(l)"{x)>-\, yx>o. (11) 

A 

Let us now find the first and second derivatives of 9{y) at 
y = T+. From (10), 

f{0{y)) = y- (12) 
Differentiating with respect to y gives 

fi9{y)) 9'{y) = 1. (13) 
Differentiating again with respect to y gives 

f"{9{y)) [9'{y)f + f'{e{y)) 9"{y) = 0. (14) 
Setting y = T+ in (13) gives 



/'(0+) 



(15) 



Setting y = T+ in (14) gives 

/"(0+) [9\T+)Y + /'(0+) 9"{T+) = (16) 



or 



9"(T+) = — ffl^ 



(17) 
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Using (9), we have 
/'(0+) = l + A</."(0+) and f"{0+) 
Hence 

e'{T+) 



A<^'"(0+). 
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and 



6»"(T+) = - 



1 + A^"(0+) 

A0"'(O+) 



(18) 



(19) 



[l + A,^"(0+)]3- 

Equations (18) and (19) will be used in the following. As 

noted above, 6'{T^) reflects the maximum sensitivity of 9. 
The value 9"{T^) is also relevant; it will be set in Sec. 11-D 
so as to induce 9{y) — y to decay rapidly to zero. 

C. The Logarithmic Penalty Function 

The logarithmic penalty function can be used for the MSC 
method to be described in Sec. III. It also serves as the model 
for the penalty function developed in Sec. II-D below, designed 
to have less bias. The logarithmic penalty is given by 



(j){x) = ^\og{l + a\x\), 0<a< 



(20) 



which is differentiable except at x 
derivative of </> is given by 

(j)'{x) = ^ \ , 
' l + a\x\ 



0. For a; ^ 0, tiie 



sign(a;), x ^ 0, 



(21) 



as illustrated in Fig. la. The function f{x) = x + X(j)'{x) is 
illustrated in Fig. lb. The threshold function 0, given by (10), 
is illustrated in Fig. Ic. 

Let us find the range of a for which F is convex. Note that 

(l)"[x) = - ,^ fora;>0 (22) 



r{x) = 



{1 + axY 
2o2 



for X > 0. 



(23) 



(1 + axY 

Using the condition (11), it is deduced that if ^ a ^ 1/A, 
then f(x) is increasing, the cost function F in (2) is convex, 
and the threshold function 9 is continuous. 

The threshold is given by T = A. To find 9'{T+) and 
e"{T+), note that 

<^"'(0+) = 



(^"(0+) = -a. 
Using (18) and (19), we have 



e'(T+) = — 



1 



e"{T+) = — ^"^•^ 



(25) 



aX' " ' (l-aA)3 

As a varies between and 1/A, the derivative 9'{T~^) varies 
between 1 and infinity. As a approaches zero, 9 approaches the 
soft-threshold function. We can set a so as to specify 9'{T^). 
Solving (25) for a gives 

1 1 



a = 



1 



(26) 



A V' 0'{T+), 

Therefore, T and 9'{T+) can be directly specified by setting 
the parameters A and a (i.e., A = T and a is given by (26)). 

Note that 9"{T~^) given in (25) is strictiy negative except 
when a = which corresponds to the soft threshold function. 
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a = 0.25 
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(24) '^'S- 1- Functions related to the logarithmic penalty function (20). 



The negativity of 9"{T~^) inhibits the rapid approach of 6 to 
the identity function. 

The threshold function 9 is obtained by solving y — f[x) 
for X, leading to 



ax^ + (1 - ay)x + {X - y) = 0, 



(27) 



which leads in turn to the expUcit formula 

M _ J_ _|_ J(M _|_ J_^2 _ A 



sign(y), |y| ^ A 



as illustrated in Fig. Ic. As shown, the gap y—9{y) goes to zero 
for large y. By increasing a up to 1/A, the gap goes to zero 
more rapidly; however, increasing a also changes 6'{T~^). The 
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single parameter a affects both the derivative at the threshold 
and convergence rate to identity. 

The next section derives a penalty function, for which the 
gap goes to zero more rapidly, for the same value of 6 (T+). 
It will be achieved by setting e"{T+) = 0. 

D. The Arctangent Penalty Function 

To obtain a penalty approaching the identity more rapidly 
than the logarithmic penalty, we use equation (21) as a model, 
and define a new penalty by means of its derivative as 

. ^-^^1^ sign(.), a>0,6>0. (28) 

Using (7), the corresponding threshold function 9 has threshold 
T = A. In order to use (18) and (19), we note 

(26a; + a) 
' (6a;2 + ax + 1)2 

= ' - . . fora;>0. 
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2{2hx ■ 



for a; > 



26. (29) 



(30) 



(6.x2 + ax + 1)-'' (6x2 -^ax + 1)2 
The derivatives at zero are given by 

0'(O+) = 1, 0"(O+) = -a, 0"'(O+) = 2a2 
Using (18), (19), and (29), we have 

^ ' 1-Xa' ^ ' (1-Aa)3 ■ 

We may set a so as to specify ^'(T+). Solving (30) for a 
gives (26), the same as for the logarithmic penalty function. 

In order that the threshold function increases rapidly toward 
the identity function, we use the parameter 6. To this end, we 
set 6 so that 9 is approximately linear in the vicinity of the 
threshold. Setting 9"{T+) = in (30) gives 6 = a2. Therefore, 
the proposed penalty function is given by 

(t>'ix) 22/1 111 sign(x). (31) 
a^x^ + a \x\ + i 

From the condition (11), we find that if ^ a ^ 1/A, then 
/(x) = x + \(j)'{x) is increasing, F is convex, and 6 is 
continuous. The parameters a and A, can be set as for the 
logarithmic penalty function; namely T = A and by (26). 

To find the threshold function 9, we solve y = x + X<p'{x) 
for X which leads to 

a^x^ + a{l - \y\ a)x^ + (1 - \y\ a)x + (A - \y\) = (32) 

for \y\ > T. The value of d{y) can be found solving the 
cubic polynomial for x, and multiplying the real root by 
sign(y). Although 9 does not have a simple functional form, 
the function 0' does. Therefore, algorithms such as MM and 
IRLS, which use 4>' instead of 9, can be readily used in 
conjunction with this penalty function. 

The penalty function itself, 4>, can be found by integrating 
its derivative: 

/■kl 

(j){x) = / (j)'{u)du (33) 



atan threshold function (T = 2) 

e(y) = r\y) 
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Fig. 2. The arctangent threshold function for several values of 9'(T'+). 




log 
atan 




(34) 



Fig. 3. Comparison of arctangent and logarithmic penalty functions, both 
with (?'(T+) = 2. The arctangent threshold function approaches the identity 
faster than the logarithmic penalty function. 



We refer to this as the arctangent penalty function. 

The threshold function is illustrated in Fig. 2 for threshold 
T = A 2 and three values of 9'{T+). With A = 2, the 
function F is convex for all a S [0, 1/A]. With 9'{T+) = 1, 
one gets a = and 9 is the soft-threshold function. With 
9'{T~^) = 2, one gets a = 1/4 and 9 converges to the identity 
function. With 9'{T^) = 00, one gets a = 1/2; in this case, 9 
converges more rapidly to the identity fimction, but 9 may be 
more sensitive than desired in the vicinity of the threshold. 
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Figure 3 compares the logarithmic and arctangent threshold 
functions where the parameters for each function are set so 
that T and e'{T+) are the same (specifically, A = T = 2 
and a = 1/4). It can be seen that the arctangent threshold 
function converges more rapidly to the identity function than 
the logarithmic threshold function. To illustrate the difference 
more clearly, the lower panel in Fig. 3 shows the gap between 
the identity function and the threshold function. For the 
arctangent threshold function, this gap goes to zero more 
rapidly. Yet, for both threshold functions, 6' has a maximum 
value of 2. The faster convergence of the arctangent threshold 
function is due to (j>'{x) going to zero like l/x"^, whereas for 
the logarithmic threshold function ^'{x) goes to zero Uke 1/x. 
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E. Other Penalty Functions 

The firm threshold function [28], and the smoothly clipped 
absolute deviation (SCAD) threshold function [22], [61] also 
provide a compromise of hard and soft thresholding. Moreover, 
both are continuous and equal to the identity function for large 
\y\. For both functions, (j)'{x) = for large x. However, some 
forms of algorithms such as IRLS, MM, etc., involve dividing 
by (j)', which then leads to divide-by-zero issues. 

A widely used penalty function is the ip pseudo-norm, < 
p <1, for which cp{x) = |a;|^ . However, using (11), it can be 
seen that for this penalty function, the cost function F(x) is 
not convex for any < p < 1. Therefore, 9 has a discontinuity 
[39], like the hard threshold function, and is Ukewise sensitive 
to small changes in y. As our current interest is in non-convex 
penalty functions for which F is convex, we do not further 
discuss the ip penalty. The reader is referred to [39] for in- 
depth analysis of this penalty fimction. 



F. Denoising Example 

To illustrate the trade-off between 6'{T^) and the bias 
introduced by thresholding, we consider the denoising of the 
noisy signal illustrated in Fig. 4. Wavelet domain thresholding 
is performed with several thresholding functions. 

Each threshold function is applied with the same threshold, 
T = 3(7. Most of the noise (c.f. the 'three-sigma rule') will 
fall below the threshold and will be eliminated. The RMSE- 
optimal choice of threshold is usually lower than 3c7, so this 
represents a larger threshold than that usually used. However, 
a larger threshold reduces the number of spurious noise peaks 
produced by hard thresholding. 

The hard threshold achieves the best RMSE, but the output 
signal exhibits spurious noise bursts due to noisy wavelet co- 
efficients exceeding the threshold. The soft threshold function 
reduces the spurious noise bursts, but attenuates the peaks and 
results in a higher RMSE. The arctangent threshold function 
suppresses the noise bursts, with modest attenuation of peaks, 
and results in an RMSE closer to that of hard thresholding. 

In this example, the signal is 'bumps' from WaveLab [18], 
with length 2048. The noise is additive white Gaussian noise 
with standard deviation a = 0.4. The orthonormal Daubechies 
wavelet with 3 vanishing moments was used for this example. 
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Fig. 4. Denoising via orthonormal wavelet thresholding using various 
threshold functions. 



III. Sparsity penalized least squares 
Consider the Unear model. 



where x G 



Hx + w 



is a sparse A^-point signal, y G 



(35) 



pM 



the observed signal, H G R^^^ is hnear operator (e.g., 
convolution), and w G is additive white Gaussian noise 
(AWGN). The vector x is denoted x = {xq, . . . , Xn-i)'^- 

Under the assumption that x is sparse, we consider the hnear 
inverse problem: 



arg mm 
xeR« 



JV-l 



^(x) = |||y-Hx||i-F ^ Xn(t>{xn;an) 



ra=0 



(36) 



where cp{x; a) is a sparsity-promoting penalty function with 
parameter a, such as the logarithmic or arctangent penalty 
functions. In many applications, all A„ are equal, i.e., A„ = A. 
For generaUty, we let this regularization parameter depend on 
the index n. 

In the following, we address the question of how to con- 
strain the regularization parameters A„ and a„ so as to ensure 
F is convex, even when (p{ ■ ; a„) is not convex. 
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A. Convexity Condition 

Let (l){x; a) denote a penalty function with parameter a. 
Consider the function / : M ^ M, defined as 

1 



f{x) = -X +X(t){x;a). 



(37) 



Assume that, for special choices of A and a, f{x) can be 
made convex. Let us give a name to the set of all such special 
choices. 

Definition 1. Let <S be the set of pairs (A, a) for which f{x) in 
(37) is convex. We refer to <S as the 'parameter set associated 
with (/)'. 

For the logarithmic and arctangent penalty functions de- 
scribed above, the set <S is given by 



S = {(A, a) : A > 0, ^ a ^ 1/A}. 



(38) 



Now, consider the function F : — >■ M, defined in (36). 
The following proposition provides a sufficient condition on 
(A„,a„) ensuring the convexity of F. 

Proposition 2. Suppose R is a positive semi-definite diagonal 
matrix such that H-^H — R is also positive semi-definite. Let 
r„ denote the n-th diagonal entry of R, i.e., [R]n,n — fnl? ^■ 
Also, let <S be the parameter set associated with (j). Then, -F(x) 
in (36) is convex if {Xn/vn, cin) S <5 for each n. 
Proof: The function -F'(x) can be written as 

F(x) = ^x^(H^H - R)x - y^Hx + ^y^y +ff(x), (39) 



9(x) 



where 



^(x) = :^x^Rx -I- ^ XnHXn, an)- 



(40) 



Note that g(x) is convex since H^H — R is positive semi 
definite. Now, since R is diagonal, we can rewrite g{x) as 



5(x) 



^ 2 

n 



Xn\ + Xn4>{Xn; an) 

2 , A. 



Xr, 



(/)(a;„;a„)^, 



(41) 
(42) 



From (42), it follows that if {Xn/rn,an) € S for each n, 
then g{x) is convex. Under this condition, being a sum of two 
functions, it also follows that F{x) is convex. ■ 

The proposition states that constraints on the penalty param- 
eters a„ ensuring convexity of F{x) can be obtained using a 
diagonal matrix R lower bounding H^H. In view of (38), the 
following is a corollary of this result. 

Corollary 1. For the logarithmic and arctangent penalty 
functions, if 



< a„ < 



Xn 



then F{x) in (36) is convex. 



(43) 

□ 



If r/c = for some fc e Zjv, then the corollary guarantees 
the convexity of F if ; a/j) is convex. For the logarithmic 
and arctangent penalty functions, that imphes ak = 0, and 



(/)(• ; Ofe) reduces to the absolute value function. On the other 
hand, if > for some k, then F can be convex even though 
(/)(• ; flfe) is not convex. 

We illustrate condition (43) with a simple example using 
N = 2 variables. We set H = I, y = [9.5,9.5]^, and 
Ao = Ai = 10. Then R = I is a positive diagonal matrix 
with H^H — R positive semidefinite. According to (43), F is 
convex if ^ 0.1, i = 0,1. Figure 5 illustrates the contours 
of the logarithmic penalty function and the cost function F for 
three values of a. For a — 0, the penalty function reduces to 
the £i norm. Both the penalty function and F are convex. For 
a = 0.1, the penalty function is non-convex but F is convex. 
The non-convexity of the penalty is apparent in the figure 
(its contours do not enclose convex regions). The non-convex 
'star' shaped contours induce sparsity more strongly than the 
diamond shaped contours of the £i norm. For a = 0.2, both 
the penalty function and F are non-convex. The non-convexity 
of F is apparent in the figure (a convex function can not have 
more than one stationary point, while the figure shows two). In 
this case, the star shape is too pronounced for F to be convex. 
In this example, o = 0.1 yields the maximally sparse convex 
(MSC) problem. 

How can an optimal R be obtained? Let us denote the 
minimal eigenvalue of H-^H by Qmin- Then R = amini 
is a positive semi-definite diagonal lower bound, as needed. 
However, this is a sub-optimal lower bound in general. For 
example, if H is a non-constant diagonal matrix, then an 
optimal lower bound is H^H itself, which is very different 
from amini in general. A tighter lower bound is of interest 
because the tighter the bound, the more non-convex the penalty 
function can be, while maintaining convexity of F. In tum, 
sparser solutions can be obtained without sacrificing convexity 
of the cost function. A tighter lower bound can be found as 
the solution to an optimization problem, as described in the 
following. 

B. An Optimal Diagonal Lower Bound 

Given H, the convexity conditions above calls for a positive 
semidefinite diagonal matrix R lower bounding H^H. In 
order to find an optimal lower bound, each r„ should be 
maximized. However, these A'' parameters must be optimized 
jointly. We formulate the calculation of R as an optimization 
problem: 



arg max 



n=0 

such that r„ ^ Ofmin 

H^H - R > 



(44) 



where R is the diagonal matrix [R]„,„ = r„. The inequality 
H^H — R ^ expresses the constraint that H^H — R is 
positive semidefinite (all its eigenvalues non-negative). Note 
that the problem is feasible, because R = aminI satisfies the 
constraints. 

Problem (44) can be recognized as a semidefinite optimiza- 
tion problem, a type of convex optimization problem for which 
algorithms have been developed and for which software is 
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x|L,a = 0.0 





||y-Hx||^ + X(Kx,a),a = 0.1 




||y-Hxir + X(t)(x, a), a = 0.2 




Fig. 5. Contour plots of the logarithmic penalty function <j) and cost function 
F for three values of a as described in the text. For a = 0.1, the function F 
is convex even though the penalty function is not. 



available [2]. The cost function in (44) is a linear function 
of the N variables, and the constraints are linear matrix 
inequalities (LMls). To solve (44) and obtain an optimal R, 
we have used the MATLAB software package 'SeDuMi' [52]. 

Often, inverse problems arising in signal processing involve 
large data sets (e.g., speech, EEG, and images). Practical algo- 
rithms must be efficient in terms of memory and computation. 
In particular, they should be 'matrix- free', i.e., the operator 
H is not explicidy stored as a matrix, nor are individual 
rows or columns of H accessed or modified. However, op- 
timization algorithms for semidefinite programming usually 
involve row/column matrix operations and are not 'matrix 
free' . Hence, solving problem (44) will likely be a bottleneck 
for large scale problems. This motivates the development 
of semidefinite algorithms for solving (44) where H is not 
explicitly available, but for which multiplications by H and 
H-^ are fast. 

Nevertheless, for ID problems of 'medium' -size, (44) is 
readily solved via existing software. In case (44) is too compu- 
tationally demanding, then the suboptimal choice R = amini 



can be used. Furthermore, we describe below a multistage 
algorithm whereby the proposed MSG approach is applied 
iteratively. 

C. Optimality Conditions and Threshold Selection 

When the cost function F in (36) is convex, then its 
minimizer must satisfy specific conditions [4, Prop 1.3]. These 
conditions can be used to verify the optimahty of a solution 
produced by a numerical algorithm. The conditions also aid 
in setting the regularization parameters A„. 

If F in (36) is convex, and 4> is differentiable except at zero, 
then X* minimizes F if 



1 



[H^(y - Hx*)]„ = </,'(x„), 



(45) 



[H^(y - Hx*)]„ e [(/>(0-), </)(0+)], < = 



where [v]„ denotes the n-th component of the vector v. 

In the example below, the optimality of the numerically 
obtained solution is illustrated by a scatter plot of [H^(y — 
Hx)]„/A„ versus a;„a„, for n G Zn, in Fig. 7. 

The condition (45) can be used to set the threshold value 
for the penalty function Suppose y follows the model 

(35) where x is sparse. One approach for setting A„ is to ask 
that the solution to (36) be all-zero when x is all-zero in the 
model (35). Note that, if x = 0, then y consists of noise only 
(i.e., y = w). In this case, (45) suggests that A„ be chosen 
such that 



A„ (^(0-) < [H^w]„ < Xn <^(0+), n e Zn. 



(46) 



For the £i norm, logarithmic and arctangent penalty functions, 
(/)(0~) = — 1 and (/)(0+) = 1, so (46) can be written as 



(47) 



However, the larger A„ is, the more Xn will be attenuated. 
Hence, it is reasonable to set A„ to the smallest value satisfying 
(47), namely. 



A„ w max |[H w]„| 



(48) 



where w is the additive noise. Although (48) assumes avail- 
ability of the noise signal w, which is unknown in practice, 
(48) can often be estimated based on knowledge of statistics 
of the noise w. For example, based on the 'three-sigma rule', 
we obtain 

A„«3std([H^w]„). (49) 
If w is white Gaussian noise with variance cr^, then 



std([H^w]„) = <7||H(-,n)||2 



(50) 



where H(-,n) denotes column n of H. For example, if H 
denotes linear convolution, then all colimms of H have equal 
norm and (49) becomes 



A„ = A«3a||h||2 
where h is the impulse of the convolution system. 



(51) 
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D. Usage of Method 

We summarize the forgoing approach, MSG, to sparsity 
penaUzed least squares, cf. (36). We assume the parameters 
A„ are fixed (e.g., set according to additive noise variance). 

1) Input: y e M^, H e K^^^, {A„ > 0, n e Zn}, 
(/) : M X M -)> M. 

2) Find a positive semidefinite diagonal matrix R such that 
H-^H — R is positive semidefinite; i.e., solve (44), or 
use the sub-optimal R = amini- Denote the diagonal 
elements of R by r„, n G Zjy. 

3) For n G Z^r, set a„ such that (r„/A„,a„) G S. Here, 
S is the set such that / in (37) is convex if (A, a) e S. 

4) Minimize (36) to obtain x. 

5) Output: X e K^. □ 
The penalty function <p need not be the logarithmic or arct- 
angent penalty functions discussed above. Another parametric 
penalty function can be used, but it must have the property 
that / in (37) is convex for (A, a) £ S for some set S. Note 
that (t){x,p) = \xf with < p < 1 does not qualify because 
/ is non-convex for all < p < 1. On the other hand, the 
firm penalty function [28] could be used. 

In step (3), for the logarithmic and arctangent penalty 
functions, one can use 

a„ = /3^, where < /3 < 1. (52) 

An 

When /3 = 0, the penalty function is simply the £i norm; in 
this case, the proposed method offers no advantage relative 
to £i norm penalized least squares (BPD/lasso). When /3 = 
1, the penalty function is maximally non-convex (maximally 
sparsity-inducing) subject to F being convex. We have used 
/3 = 0.9 in the examples below. 

The minimization of (36) in step (4) is a convex opti- 
mization problem for which numerous algorithms have been 
developed as noted in Sec. I-B. The most efficient algorithm 
depends primarily on the properties of H. 

E. Iterative MSC (IMSC) 

An apparent limitation of the proposed approach, MSC, is 
that for some problems of interest, the parameters r„ are either 
equal to zero or nearly equal to zero for all n e Zn, i.e., 
R « 0. In this case, the method requires that (f){- ;a„) be 
convex or practically convex. For example, for the logarithmic 
and arctangent penalty functions, r„ ~ leads to a„ ~ 0. As a 
consequence, the penalty function is practically the ii norm. 
In this case, the method offers no advantage in comparison 
with ii norm penahzed least squares (BPD/lasso). 

The situation wherein R « arises in two standard 
sparse signal processing problems: basis pursuit denoising and 
deconvolution. In deconvolution, if the system is non-invertible 
or nearly singular (i.e., the frequency response has a null 
or approximate null at one or more frequencies), then the 
optimal lower bound R will be R ~ 0. In BPD, the matrix 
H often represents the inverse of an overcomplete frame (or 
dictionary), in which case the optimal lower bound R is again 
close to zero. 

In order to broaden the applicability of MSC, we describe 
iterative MSC (IMSC) wherein MSC is applied several times. 



On each iteration, MSC is applied only to the non-zero 
elements of the sparse solution x obtained as a result of the 
previous iteration. Each iteration involves only those colunnns 
of H corresponding the previously identified non-zero com- 
ponents. As the number of active columns of H diminishes as 
the iterations progress, the problem (44) produces a sequence 
of increasingly positive diagonal matrix R. Hence, as the 
iterations progress, the penalty functions become increasingly 
non-convex. The procedure can be repeated until there is no 
change in the index set of non-zero elements. 

The IMSC algorithm is initiahzed with the £i norm solution, 
i.e., using (f){x, a„) = |a;| for all n e Z^r. (For the logarithmic 
and arctangent penalties, a„ = 0, n e Zjv .) We assume the 
£i norm solution is reasonably sparse; otherwise, sparsity is 
likely not useful for the problem at hand. The algorithm should 
be terminated when there is no change (or only insignificant 
change) between the active set from one iteration to the next. 

The IMSC procedure is described as follows, where i > 1 
denotes the iteration index. 

1) Initialization. Find the £i norm solution: 



are: mm y 



N-l 

E 

n=0 



Hx||2+ V A„|a;„|. (53) 



Set i = 1 and K^°^ = N. Note H is of size M x N. 

2) Identify the non-zero elements of x('\ and record their 
indices in the set /C'^*\ 



(54) 



This is the support of x^'^. Let K^^^ be the number of 
non-zero elements of i.e., X^*^ = |/C''*^|. 

3) Check the termination condition: If K'-'^^ is not less than 
K^^~^\ then terminate. The output is x('\ 

4) Define H*^*) as the sub-matrix of H containing only 
columns k € K^^ . The matrix H^*) is of size M x K'^\ 
Find a positive semidefinite diagonal matrix R^'^ lower 
bounding [H^'^J^H^'^ i.e., solve problem (44) or use 
a^'l I. 

R(^ is of size K^^ x K'^^ . 

5) Set a„ such that (A„/r4'\a„) & S, n & IC^'\ For 
example, with the logarithmic and arctangent penalties, 
one may set 



(55) 



for some < ^ < 1. 
6) Solve the K^'^^ dimensional convex problem: 



uW=arg min ||y-HWu||^+ V A„0(u„;aW). 

(56) 

7) Set x('+i) as 



8) Set i = i-\-\ and go to step 2). 



(57) 



□ 
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In the IMSC algorithm, the support of x^*) can only shrink 
from one iteration to the next, i.e., /C^*+^) C /C^*^ and 
^(«+i) ^ K^'^\ Once there is no further change in }C^^\ each 
subsequent iteration will produce exactly the same result, i.e.. 



(58) 



For this reason, the procedure should be terminated when 
/C^*^ ceases to shrink. In the ID sparse deconvolution example 
below, the IMSC procedure terminates after only three or four 
iterations. 

Note that the size of problem (44) in step 4) reduces in 
size as the algorithm progresses. Hence each instance of (44) 
requires less computation than the previous. More importantly, 
each matrix H^*+^^ has a subset of the columns of H*^*\ 
Hence, R('+^^ is less constrained than R('\ and the penalty 
functions become more non-convex (more strongly sparsity- 
inducing) as the iterations progress. Therefore, the IMSC 
algorithm produces a sequence of successively sparser x^'^ . 

Initializing the IMSC procedure with the £i norm solution 
substantially reduces the computational cost of the algorithm. 
Note that if the ii norm solution is sparse, i.e., K^^^ <C N, 
then all the semidefinite optimization problems (44) have far 
fewer variables than N, i.e., K'^''> ^ K^'^l Hence, IMSC 
can be appUed to larger data sets than would otherwise be 
computationally practical, given the computational cost of 
(44). 

F. Deconvolution Example 

A sparse signal x{n) of length = 1000 is generated so 
that (0 the inter-spike interval is uniform random between 5 
and 35 samples, and (//) the amplitude of each spike is uniform 
between —1 and 1. The signal is illustrated in Fig. 6. 

The spike signal is then used as the input to a Unear time- 
invariant (LTI) system, the output of which is contaminated 
by AWGN, w{n). The observed data, y{n), is written as 



where w{n) 



> b{k) x{n ■ 



k 



k) — a(k) y(n — k) + w{n) 



~ ^(0, c^). It can also be written as 
A-^Bx-|-w = Hx-Fw, H = A-^B 



where A and B are banded Toeplitz matrices [50]. In this 

example, we set 6(0) = 1, h(l) = 0.8, a(0) = 1, a(l) = 
-1.047, a(2) = 0.81, and a = 0.2. The observed data, y, is 
illustrated in Fig. 6. 

Several algorithms for estimating the sparse signal x will 
be compared. The estimated signal is denoted x. The accuracy 
of the estimation is quantified by the (.2 and (.1 norms of the 
error signal and by the support error, denoted L2E, LIE, and 
SE respectively. 

1) L2E = ||x- x||2 

2) LlE = llx-ijli 

3) SE = ||s(x)-s(x)||o 

The support error, SE, is computed using s(x), the e-support 
of x e K^. Namely, s : -s- {0, 1}^ is defined as 



S(x)]n 



\Xr, 



> e 



(59) 
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Fig. 6. Sparse deconvolution via sparsity penalized least squares. 

where e > is a small value to accommodate negligible non- 
zeros. We set e = 10~^. The support error, SE, counts both the 
false zeros and the false non-zeros of x. The numbers of false 
zeros and false non-zeros are denoted FZ and FN, respectively. 

First, the sparse £i norm solutions, i.e., (j){x,a) = \x\ 
in (36), with and without debiasing, are computed. We set 
A„ according to (51), i.e., A„ = 2.01, n € Zjv. The 
estimated signals are illustrated in Fig. 6. The errors L2E, 
LIE, and SE, are noted in the figure. As expected, debiasing 
substantially improves the L2E and LIE errors of the £i norm 
solution; however, it does not improve the support error, SE. In 
particular, debiasing does not make the solution more sparse. 
The errors, averaged over 200 trials, are shown in Table 1. 
Each trial consists of independently generated sparse and noise 
signals. 

Next, sparse deconvolution is performed using three algo- 
rithms developed to solve the highly non-convex quasi- 
norm problem, namely the Iterative Support Detection (ISD) 
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TABLE I 

Sparse deconvolution example. Average errors (200 trials). 



Algorithm 


L2E 


LIE 


SE 


(FZ, FN) 


£i norm 


1.443 


10.01 


37.60 


(10.3,27.3) 


£i norm + debiasing 


0.989 


7.14 


37.57 


(10.3,27.2) 


AIHT [6] 


1.073 


6.37 


24.90 


(12.4,12.5) 


ISD [56] 


0.911 


5.19 


19.67 


(11.6, 8.1) 


SBR [51] 


0.788 


4.05 


13.62 


(12.0, 1.6) 


epip = 0.7) IRL2 


0.993 


5.80 


16.32 


(12.9, 3.4) 


tp\p — \j. 1 ) iKLz + aeuiasing 


U.yZ4 


A Q1 

4.0Z 


10. oz 




epip = 0.7) IRLl 


0.884 


5.29 


14.43 


(11.5, 2.9) 


epip = 0.7) IRLl + debiasing 


0.774 


4.18 


14.43 


(11.5, 2.9) 


IMSC (log) 


0.896 


5.35 


19.50 


(10.4, 9.1) 


IMSC (log) + debiasing 


0.834 


5.00 


19.50 


(10.4, 9.1) 


IMSC (atan) 


0.794 


4.52 


16.93 


(10.5, 6.4) 


IMSC (atan) + debiasing 


0.790 


4.54 


16.92 


(10.5, 6.4) 


IMSC/S (atan) 


0.967 


5.94 


20.16 


(10.3, 9.9) 


IMSC/S (atan) + debiasing 


0.826 


5.02 


20.14 


(10.3, 9.9) 



algorithm [56], ^ the Accelerated Iterative Hard Thresholding 
(AIHT) algorithm [6], ^ and the Single Best Replacement 
(SBR) algorithm [51]. In each case, we used software by 
the respective authors. The ISD and SBR algorithms require 
regularization parameters p and A respectively; we found that 
p = 1.0 and A = 0.5 were approximately optimal. The AIHT 
algorithm requires the number of non-zeros be specified; we 
used the number of non-zeros in the true sparse signal. Each 
of ISD, AIHT, and SBR significantly improve the accuracy of 
the result in comparison with the £i norm solutions, with SBR 
being the most accurate. These algorithms essentially seek the 
correct support. They do not penalize the values in the detected 
support; so, debiasing does not alter the signals produced by 
these algorithms. 

The ip quasi-norm, with p = 0.7, i.e. <^(a;) = \x\^, 
also substantially improves upon the £i norm result. Several 
methods exist to minimize the cost function F in this case. We 
implement two methods: IRL2 and IRLl (iterative reweighted 
£2 and £1 norm minimization, respectively), with and without 
debiasing in each case. We used A = 1.0, which we found to 
be about optimal on average for this deconvolution problem. 
As revealed in Table 6, IRLl is more accurate than IRL2. 
Note that 1RL2 and IRLl seek to minimize exactly the same 
cost function; so the inferiority of IRL2 compared to IRLl is 
due to the convergence of IRL2 to a local minimizer of F. 
Also note that debiasing substantially improves L2E and LIE 
(with no effect on SE) for both IRL2 and IRLl. The ip results 
demonstrate both the value of a non-convex regularizer and the 
vulnerability of non-convex optimization to local minimizers. 

The results of the proposed iterative MSC (IMSC) algo- 
rithm, with and without debiasing, are shown in Table I. We 
used /3 = 0.9 and A„ = 2.01, n e Z^r, in accordance with 
(51). Results using the logarithmic (log) and arctangent (atan) 
penalty functions are tabulated, which show the improvement 
provided by the later penalty, in terms of L2E, LIE, and SE. 
While debiasing reduces the error (bias) of the logarithmic 
penalty, it has negligible effect on the arctangent penalty. The 
simplified form of the MSC algorithm, wherein R = Ofminl is 
used instead of the optimal R computed using SDP, is also 

^http://www.caam.rice.edu/%7Eoptimization/LI/ISD/ 
'http://users.fmrib.ox.ac.uli/%7Etblmnens/sparsify/sparsify.html 



tabulated in Table I, denoted by IMSC/S. IMSC/S is more 
computationally efficient than MSC due to the omission of 
SDP; however, it does lead to an increase in the error measures. 

The IMSC algorithm ran for three iterations on average. 
For example, the IMSC solution illustrated in Fig. 6 ran with 
ii'(i) = 61, = 40, and K<^^'> = 38. Therefore, even 

though the signal is of length 1000, the SDPs that had to be 
solved are much smaller: of sizes 61, 40, and 38, only. 

The optimality of the MSC solution at each stage can be 
verified using (45). Specifically, a scatter plot of [H^(y — 
Hx)]„/A„ verses a:„a„, for all n € /C^'\ should show all 
points lying on the graph of d(f){x, 1). For the IMSC solution 
illustrated in Fig. 6, this optimality scatter plot is illustrated in 
Fig. 7, which shows that all points he on sign(x)/(l-|-|a;|-|-a;^), 
hence verifying the optimality of the obtained solution. 

To more clearly compare the relative bias of the £1 norm 
and IMSC (atan) solutions, they are illustrated together in 
Fig. 7. Only the non-zero elements of each solution are shown. 
In this figure, the closer the points lie to the identity, the 
more accurate the solution. The figure shows that the £1 norm 
solution systematically underestimates the true values more so 
than does the MSC (atan) solution. 

In terms of L2E and LIE, the best IMSC result, i.e., IMSC 
(atan), is outperformed by SBR and the IRLl + debiasing 
algorithm. In addition, IMSC (atan) yields lower SE than £1 
minimization, AIHT, and ISD. IMSC does not yield the best 
error measures, but it comes reasonably close; even though, 
IMSC is based entirely on convex optimization. In terms of 
LIE and SE, the SBR performs best for this example. Most 
notably, SBR attains a small number of false non-zeros. 

Note that IMSC requires only the parameter /3 (with ^ 
/3 < 1) beyond those parameters (namely A„) required for the 
£i norm solution. 

Fig. 8 illustrates the average errors as functions of the reg- 
ularization parameter, for ISD, IMSC, and IMSC + debiasing 
(denoted IMSC-i-d in the figure). For IMSC, the regularization 
parameter is A. For ISD, the regularization parameter is 
p = A/2. Note that for IMSC, the value of A minimizing L2E 
and LIE depends on whether or not debiasing is performed. 
The value A suggested by (51) (i.e., A = 2) is reasonably 
effective with or without debiasing. The value of A minimizing 
SE is somewhat higher. 

The implementation of the li, IRL2, IRLl, and IMSC 
algorithms for deconvolution each require the solution of 
(36) with various penalty functions and/or sub-matrices of 
H. We have used algorithms, based on majorization of the 
penalty function, that exploit banded matrix structures for 
computational efficiencies [49], [50]. 

IV. Conclusion 

This paper proposes an approach (MSC) to obtain sparse 
solutions to ill-posed linear inverse problems. In order to 
induce sparsity more strongly than the £\ norm, the MSC 
approach utilizes non-convex penalty functions. However, the 
non-convex penalty functions are constrained so that the 
total cost function is convex. The maximally non-convex 
(maximally sparsity-inducing) constrained penalty functions 
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Fig. 7. Sparse deconvolution. (a) Illustration of optimality condition (45) for 
IMSC (atan) solution, (b) Comparison of £i norm and IMSC solutions. 



are found by formulating a semidefinite program (SDP). 
Iterative MSG (IMSC) consists of applying MSG to the non- 
zero (active) elements of the sparse solution produced by the 
previous iteration. 

The MSG method is intended as a convex alternative to 
£i norm minimization, which is widely used in sparse signal 
processing. Being based entirely on convex optimization, it 
can not be expected that MSG produces solutions as sparse 
as non-convex optimization methods, such as £p quasi-norm 
(0 < p < 1) minimization. However, it provides a principled 
approach for enhanced sparsity relative to the ii norm. More- 
over, although it is not explored here, it may be effective to 
use MSG in conjunction with other techniques. As has been 
recognized and illustrated in the sparse deconvolution example 
above, reweighted £i norm minimization can be more effective 
than reweighted £2 norm minimization (i.e., higher hkelihood 
of convergence to a global minimizer). Likewise, it will be 
of interest to explore the use of reweighted MSG or similar 
methods as a means of more rehable non-convex optimization. 
For example, a non-convex algorithm of the MM-type may 
be devised wherein a general non-convex penalty function 
is majorized by a non-convex function constrained so as to 
ensure convexity of the total cost function. 

To apply the proposed approach to large scale problems 
in practice, it will be necessary to have an algorithm for 
solving (44) which does not rely on accessing or manipulating 
individual rows or columns of H. 
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